Grapheme-to-Phoneme Models for (Almost) Any Language
Abstract
Grapheme-to-phoneme (g2p) models are rarely available in low-resource languages, because creating training and evaluation data is expensive and time-consuming. We use Wiktionary to obtain more than 650k word-pronunciation pairs in more than 500 languages. We then develop phoneme and language distance metrics based on phonological and linguistic knowledge. Applying these metrics, we adapt g2p models for high-resource languages to create models for related low-resource languages. We report results for models in 229 adapted languages.
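The abstract does not spell out how the distance metrics are defined. As a rough illustration only, the sketch below shows one simple way such pieces could fit together, assuming toy hand-made articulatory feature sets (not the paper's feature data). Phoneme distance is the number of differing features; language distance is a symmetric average nearest-phoneme distance between inventories; adaptation maps each phoneme produced by a high-resource model onto its nearest neighbour in the low-resource inventory. All names (`FEATURES`, `map_phonemes`, `lang_A`, `lang_B`, the inventories) are hypothetical.

```python
"""Hypothetical sketch: feature-based phoneme distance, nearest-phoneme mapping,
and inventory-based language distance. All feature sets and inventories are
toy values for illustration, not data or metrics from the paper."""

# Toy articulatory feature sets (assumption: a tiny hand-made subset).
FEATURES: dict[str, frozenset[str]] = {
    "p": frozenset({"consonant", "labial", "stop"}),
    "b": frozenset({"consonant", "labial", "stop", "voiced"}),
    "t": frozenset({"consonant", "coronal", "stop"}),
    "d": frozenset({"consonant", "coronal", "stop", "voiced"}),
    "k": frozenset({"consonant", "dorsal", "stop"}),
    "g": frozenset({"consonant", "dorsal", "stop", "voiced"}),
    "s": frozenset({"consonant", "coronal", "fricative"}),
    "z": frozenset({"consonant", "coronal", "fricative", "voiced"}),
    "ʃ": frozenset({"consonant", "coronal", "postalveolar", "fricative"}),
    "m": frozenset({"consonant", "labial", "nasal", "voiced"}),
    "n": frozenset({"consonant", "coronal", "nasal", "voiced"}),
    "i": frozenset({"vowel", "high", "front", "voiced"}),
    "e": frozenset({"vowel", "mid", "front", "voiced"}),
    "a": frozenset({"vowel", "low", "central", "voiced"}),
    "o": frozenset({"vowel", "mid", "back", "round", "voiced"}),
    "u": frozenset({"vowel", "high", "back", "round", "voiced"}),
}


def phoneme_distance(a: str, b: str) -> int:
    """Number of features on which two phonemes differ (symmetric difference)."""
    return len(FEATURES[a] ^ FEATURES[b])


def map_phonemes(pron: list[str], target_inventory: set[str]) -> list[str]:
    """Replace each phoneme missing from the target inventory with its nearest
    neighbour there; ties go to the first candidate in sorted order."""
    candidates = sorted(target_inventory)
    return [
        p if p in target_inventory
        else min(candidates, key=lambda q: phoneme_distance(p, q))
        for p in pron
    ]


def _mean_nearest(src: set[str], dst: set[str]) -> float:
    """Average distance from each phoneme in src to its closest phoneme in dst."""
    return sum(min(phoneme_distance(p, q) for q in dst) for p in src) / len(src)


def language_distance(inv_a: set[str], inv_b: set[str]) -> float:
    """Symmetric average nearest-neighbour distance between two non-empty inventories."""
    return (_mean_nearest(inv_a, inv_b) + _mean_nearest(inv_b, inv_a)) / 2


def closest_language(target: set[str], candidates: dict[str, set[str]]) -> str:
    """Pick the candidate (high-resource) language whose inventory is nearest."""
    return min(sorted(candidates), key=lambda name: language_distance(target, candidates[name]))


# Hypothetical inventories: one low-resource target, two high-resource candidates.
TARGET = {"p", "t", "k", "s", "m", "n", "i", "a", "u"}
HIGH_RESOURCE = {
    "lang_A": {"p", "b", "t", "d", "k", "g", "s", "ʃ", "m", "n", "i", "e", "a", "o", "u"},
    "lang_B": {"p", "t", "k", "s", "z", "m", "n", "i", "a", "u", "e"},
}

if __name__ == "__main__":
    best = closest_language(TARGET, HIGH_RESOURCE)
    print(best)  # lang_B
    # Pretend the lang_B g2p model produced /zema/; adapt it to the target inventory.
    print(map_phonemes(["z", "e", "m", "a"], TARGET))  # ['s', 'i', 'm', 'a']
```

Here `lang_B` wins because its inventory differs from the target only in /z/ and /e/, each of which has a close substitute. A real system would need a much richer feature set and would likely weigh other linguistic signals when choosing the source language.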
Similar resources
Integrating Thai grapheme based acoustic models into the ML-MIX framework - for language independent and cross-language ASR
Grapheme based speech recognition is a powerful tool for rapidly creating automatic speech recognition (ASR) systems in new languages. For purposes of language independent or cross language speech recognition it is necessary to identify similar models in the different languages involved. For phoneme based multilingual ASR systems this is usually achieved with the help of a language independent ...
Conversion from phoneme based to grapheme based acoustic models for speech recognition
This paper focuses on acoustic modeling in speech recognition. A novel approach how to build grapheme based acoustic models with conversion from existing phoneme based acoustic models is proposed. The grapheme based acoustic models are created as weighted sum from monophone acoustic models. The influence of particular monophone is determined with the phoneme to grapheme confusion matrix. Furthe...
Statistical Grapheme to Phoneme Conversion using Language Origin
This report describes a method for grapheme to phoneme conversion using statistical models of pronunciation. The available techniques for this conversion are first described and examples of each are given. A baseline system which uses Hidden Markov Models to represent phonemes in English is described and evaluated. The results from the baseline system serve to replicate previous research and to...
Investigations on joint-multigram models for grapheme-to-phoneme conversion
We present a fully data-driven, language independent way of building a grapheme-to-phoneme converter. We apply the joint-multigram approach to the alignment problem and use standard language modelling techniques to model transcription probabilities. We study model parameters, training procedures and effects of corpus size in detail. Experiments were conducted on English and German pronunciation...
Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information
Machine transliteration is an automatic method to generate characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language application such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on ei...
Publication date: 2016